markov decision process
Fast Bellman Updates for Wasserstein Distributionally Robust MDPs
Markov decision processes (MDPs) often suffer from the sensitivity issue under model ambiguity. In recent years, robust MDPs have emerged as an effective framework to overcome this challenge. Distributionally robust MDPs extend the robust MDP framework by incorporating distributional information of the uncertain model parameters to alleviate the conservative nature of robust MDPs.
- Asia > China > Hong Kong (0.04)
- North America > United States > Massachusetts (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania (0.04)
- (4 more...)
- North America > United States (0.06)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > New Hampshire (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > New Hampshire (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
- Asia > China (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > New Hampshire (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)
- Information Technology (0.87)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)
- Asia > Middle East > Jordan (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- (2 more...)
- Transportation > Ground > Road (0.94)
- Transportation > Electric Vehicle (0.94)
- Automobiles & Trucks (0.94)
Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning
Jiao, Yuchen, Woo, Jiin, Li, Gen, Joshi, Gauri, Chi, Yuejie
Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves sample complexity $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{\varepsilon^3}\right)$, where $\|h^{\star}\|_{\mathsf{sp}}$ is the span norm of the bias function, improving previous results by at least a factor of $\frac{\|h^{\star}\|_{\mathsf{sp}}^2}{\varepsilon^2}$. In the federated setting with $M$ agents, we prove that collaboration reduces the per-agent sample complexity to $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{M\varepsilon^3}\right)$, with only $\widetilde{O}\left(\frac{\|h^{\star}\|_{\mathsf{sp}}}{\varepsilon}\right)$ communication rounds required. These results establish the first federated Q-learning algorithm for average-reward MDPs, with provable efficiency in both sample and communication complexity.
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)